
\documentclass{article}
\usepackage{amsmath}
\usepackage[letterpaper,left=.5in,right=.5in,top=.05in,bottom=.8in]{geometry}
\usepackage{natbib}

\begin{document}

\title{Models for Contingency Tables With Known Margins When Target and Sampled Populations Differ}


\author{\textsc{Roderick J. A. Little} and \textsc{Mei-Miau Wu}\thanks{Roderick J. A. Little is Professor, Department of Biomathematics, U.C.L.A. School of Medicine, Los Angeles, CA 90024. Mei-Miau Wu is Assistant Researcher, Western Human Nutrition Research Center, Agricultural Research Service, U.S.D.A., P.O. Box 29997, San Francisco, CA 94129. This article summarizes part of Dr. Wu's Ph.D. dissertation (Wu 1987). Authors are listed alphabetically. Dr. Little's work was supported by USPHS Grant MH 37188 from the National Institute of Mental Health. The authors particularly thank Dr. A. A. Afifi for his encouragement and numerous helpful suggestions, and Rob Weiss, Alan Zaslavsky, an associate editor, and two referees for useful comments.}}

\date{}

\maketitle

\begin{abstract}
The analysis of two-way contingency tables with known margins is considered. Four methods for estimating the cell probabilities are compared, namely, raking (RAKE), maximum likelihood under random sampling (MLRS), minimum chi-squared (MCSQ), and least squares (LSQ). Assuming random sampling from the target population, these methods are known to be asymptotically equivalent, and small-sample studies have suggested that MCSQ is slightly better than the other methods in average root mean squared error. We consider properties of the methods when the sampled population differs from the target population, through deficiencies in the sampling frame or defects in the implementation of the sample. We show that each method is in fact maximum likelihood for a particular model relating the target and sampled populations. Expressions for the standard errors of the estimates are developed under these alternative models. The methods are compared on data from a health survey and in a simulation study where each of the methods is assessed using data generated in a variety of ways. The results suggest that LSQ is inferior to the other three methods, and RAKE and MLRS dominate MCSQ.
\end{abstract}

\section{Introduction}

When analyzing $I \times J$ contingency tables, there are
situations where data furnished by a sample survey can be
adjusted for consistency with marginal distributions
obtained from other sources or by deduction from established
theory. For example, the table may be a two-way
classification by $A$ and $B$ obtained from a survey, with the
marginal distributions of $A$ and $B$ available from a census. Let
$\pi_{ij}$ denote the probability that $A = i$ and $B = j$ in the target population of interest, and let $\pi_{i+}$ and $\pi_{+j}$ denote the known marginal probabilities. Also let $p_{ij} = \frac{n_{ij}}{n}$ denote the sample cell proportion, where $n_{ij}$ is the number of sampled units in cell $(i, j)$ and $n$ is the sample size. The problem of interest is to find
estimators, $\hat\pi_{ij}$, of $\pi_{ij}$, by adjusting the sample cell proportions $p_{ij}$ so that
\begin{align}
 \sum_{j} \hat\pi_{ij} = \pi_{i+}, \quad & i = 1, \ldots, I;
 \notag \\
 \sum_{i} \hat\pi_{ij} = \pi_{+j}, \quad & j = 1, \ldots, J.
 \label{eq:intro:calibration:constraints}
\end{align}
This problem was first studied by \citet{deming:stephen:1940}. They proposed using as adjusted estimates of the cell probabilities $\{\pi_{ij}\}$ the values $\{\hat\pi_{ij}^{\rm LS}\}$ that minimize the
weighted least squares criterion,
\begin{align}
 \sum_{j} \sum_{i} \frac{(p_{ij}-\hat\pi_{ij})^{2}}{p_{ij}},
 \label{eq:intro:wls:criterion}
\end{align}
subject to the known marginal restrictions (\ref{eq:intro:calibration:constraints}). Using the
method of Lagrange multipliers, the least squares estimates
can be shown to take the form
\begin{equation}
 \frac{\hat\pi_{ij}^{\rm LS}}{p_{ij}} = \hat\mu^{\rm LS} +
 \hat\alpha_{i}^{\rm LS} + \hat\beta_{j}^{\rm LS},
 \label{eq:intro:two:way}
\end{equation}
for suitable choices of $\hat\mu^{\rm LS}$, $\{\hat\alpha_{i}^{\rm LS}\}$,
and $\{\hat\beta_{j}^{\rm LS}\}$. Since there are
no direct solutions, Deming and Stephan proposed an iterative
proportional fitting (IPF) method, the Deming-Stephan
algorithm, to find the estimators. Let $\{\hat\pi_{ij}^{(t)} \}$
be the estimates of $\{\pi_{ij}\}$ at the $t$th iteration,
and initially let $\hat\pi_{ij}^{(0)} = p_{ij}$ for all $i$, $j$.
The algorithm proceeds by alternating row and column
adjustments, such that at iteration $t$:
\begin{align}
 \hat\pi_{ij}^{(t)} = \frac{\pi_{i+}\hat\pi_{ij}^{(t-1)}}
 {\hat\pi_{i+}^{(t-1)}}, \quad & \mbox{if $t$ is odd},
  \notag \\
 = \frac{\pi_{+j}\hat\pi_{ij}^{(t-1)}}
 {\hat\pi_{+j}^{(t-1)}} , \quad & \mbox{if $t$ is even},
 \label{eq:intro:iterations}
\end{align}
where $+$ denotes summation over the corresponding subscript.
This application of IPF to contingency tables with known margins is called \emph{raking}.
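
As a small numerical illustration of the iteration (\ref{eq:intro:iterations}), consider a hypothetical $2 \times 2$ table. The values below are invented purely for illustration and are not taken from any survey. Suppose
\begin{equation*}
 \{p_{ij}\} = \begin{pmatrix} 0.20 & 0.30 \\ 0.10 & 0.40 \end{pmatrix},
 \quad (\pi_{1+}, \pi_{2+}) = (0.4, 0.6),
 \quad (\pi_{+1}, \pi_{+2}) = (0.5, 0.5).
\end{equation*}
The sample row margins are $(0.5, 0.5)$, so the row adjustment at $t = 1$ multiplies the first row by $0.4/0.5 = 0.8$ and the second by $0.6/0.5 = 1.2$. The resulting column margins are $(0.28, 0.72)$, so the column adjustment at $t = 2$ multiplies the first column by $0.5/0.28$ and the second by $0.5/0.72$:
\begin{equation*}
 \{\hat\pi_{ij}^{(1)}\} = \begin{pmatrix} 0.16 & 0.24 \\ 0.12 & 0.48 \end{pmatrix},
 \quad
 \{\hat\pi_{ij}^{(2)}\} = \begin{pmatrix} 2/7 & 1/6 \\ 3/14 & 1/3 \end{pmatrix}
 \approx \begin{pmatrix} 0.286 & 0.167 \\ 0.214 & 0.333 \end{pmatrix}.
\end{equation*}
After the second step the column margins are matched exactly, but the row margins are $(19/42, 23/42) \approx (0.452, 0.548)$ rather than $(0.4, 0.6)$, so further iterations are needed. Because each step only rescales whole rows or whole columns, the cross-product ratio $\hat\pi_{11}\hat\pi_{22}/(\hat\pi_{12}\hat\pi_{21})$ stays at its sample value $8/3$ throughout.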

\citet{stephan:1942} later pointed out that raking only approximates
the least squares estimates $\{\hat\pi_{ij}^{\rm LS}\}$ that minimize
(\ref{eq:intro:wls:criterion}). In particular, the final raking estimates, say $\{\hat\pi_{ij}^{\rm RK}\}$, can be shown to have the form
\begin{equation}
 \ln\left(\frac{\hat\pi_{ij}^{\rm RK}}{p_{ij}}\right) = \hat\mu^{\rm RK} + \hat\alpha_{i}
 ^{\rm RK}+ \hat\beta_{j}^{\rm RK}
\end{equation}
for suitable choices of $\hat\mu^{\rm RK}$, $\{\hat\alpha_{i}^{\rm RK}\}$, and
$\{\hat\beta_{j}^{\rm RK}\}$, rather than the form (\ref{eq:intro:two:way}) of the least
squares estimates. Stephan presented an alternative algorithm that converges to the least squares estimate $\{\hat\pi_{ij}^{\rm LS}\}$.
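
The origin of the two forms can be sketched as follows, assuming all $p_{ij} > 0$ and writing $\lambda_{i}$ and $\gamma_{j}$ for Lagrange multipliers; these symbols are introduced here only for illustration. For the least squares criterion (\ref{eq:intro:wls:criterion}) under the constraints (\ref{eq:intro:calibration:constraints}), setting the derivative of the Lagrangian with respect to $\hat\pi_{ij}$ to zero gives
\begin{equation*}
 \frac{2(\hat\pi_{ij} - p_{ij})}{p_{ij}} - 2\lambda_{i} - 2\gamma_{j} = 0
 \quad \Longrightarrow \quad
 \frac{\hat\pi_{ij}}{p_{ij}} = 1 + \lambda_{i} + \gamma_{j},
\end{equation*}
which has the additive row-plus-column structure of (\ref{eq:intro:two:way}). Raking, by contrast, only ever multiplies a row or a column by a constant, so after any number of iterations
\begin{equation*}
 \hat\pi_{ij}^{(t)} = p_{ij} \, a_{i}^{(t)} \, b_{j}^{(t)}
 \quad \Longrightarrow \quad
 \ln\left(\frac{\hat\pi_{ij}^{(t)}}{p_{ij}}\right) = \ln a_{i}^{(t)} + \ln b_{j}^{(t)},
\end{equation*}
where $a_{i}^{(t)}$ and $b_{j}^{(t)}$ denote the accumulated row and column factors. The ratio $\hat\pi_{ij}/p_{ij}$ is therefore additive on the log scale for raking and on the original scale for least squares, which is why the two sets of estimates generally differ.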

Ireland and Kullback (1968) showed that the raking estimators $\{\hat\pi_{ij}^{\rm RK}\}$ in fact minimize the discrimination
information,
\begin{equation}
 I(\hat\pi; p) = \sum_{i} \sum_{j} \hat\pi_{ij} \ln\left(\frac
 {\hat\pi_{ij}}{p_{ij}}\right)
\end{equation}
subject to the same marginal restrictions (\ref{eq:intro:calibration:constraints}). Here
$\boldsymbol{\hat\pi}$ and $\mathbf{p}$ denote the matrices
containing $\{\hat\pi_{ij}\}$ and $\{p_{ij}\}$ respectively.
They showed that raking converges when the observed proportions
are all nonzero and discussed the rate of convergence. They
also showed that raking estimates are not maximum likelihood (ML)
estimates of the cell probabilities when the observed data are a random sample from the target population, but that, like ML estimates, the raking estimates are consistent and best asymptotically normal.
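
A brief sketch of why a minimizer of the discrimination information has the log-additive raking form, assuming all $p_{ij} > 0$ and using Lagrange multipliers $\lambda_{i}$ and $\gamma_{j}$ that are introduced here only for illustration: differentiating
\begin{equation*}
 \sum_{i} \sum_{j} \hat\pi_{ij} \ln\left(\frac{\hat\pi_{ij}}{p_{ij}}\right)
 - \sum_{i} \lambda_{i} \Bigl( \sum_{j} \hat\pi_{ij} - \pi_{i+} \Bigr)
 - \sum_{j} \gamma_{j} \Bigl( \sum_{i} \hat\pi_{ij} - \pi_{+j} \Bigr)
\end{equation*}
with respect to $\hat\pi_{ij}$ and setting the result to zero gives
\begin{equation*}
 \ln\left(\frac{\hat\pi_{ij}}{p_{ij}}\right) + 1 - \lambda_{i} - \gamma_{j} = 0
 \quad \Longrightarrow \quad
 \ln\left(\frac{\hat\pi_{ij}}{p_{ij}}\right) = -1 + \lambda_{i} + \gamma_{j}.
\end{equation*}
The right-hand side is a constant plus a row term plus a column term, which is the same structure as the form satisfied by the raking estimates. Since the criterion is strictly convex in $\{\hat\pi_{ij}\}$ and the constraints are linear, a stationary point that satisfies the margins is the unique minimizer.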

Two other methods for estimating $\{\hat\pi_{ij}\}$ subject to (\ref{eq:intro:calibration:constraints}) have


\bibliography{ctacmo}
\bibliographystyle{chicago}

\end{document}

